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Abstract 

We introduce the blind subspace deconvolution (BSSD) problem, which is the extension of both the blind 
source deconvolution (BSD) and the independent subspace analysis (ISA) tasks. We examine the case of 
the undercomplete BSSD (uBSSD). Applying temporal concatenation we reduce this problem to ISA. The 
associated 'high dimensional' ISA problem can be handled by a recent technique called joint f-decorrelation 
(JFD). Similar decorrelation methods have been used previously for kernel independent component analysis 
(kernel-ICA). More precisely, the kernel canonical correlation (KCCA) technique is a member of this family, 
and, as is shown in this paper, the kernel generalized variance (KGV) method can also be seen as a decor- 
relation method in the feature space. These kernel based algorithms will be adapted to the ISA task. In 
the numerical examples, we (i) examine how efficiently the emerging higher dimensional ISA tasks can be 
tackled, and (ii) explore the working and advantages of the derived kernel-ISA methods. 
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3 1 Introduction 

l> ■ 

Independent component analysis (ICA) [H [2] aims to recover linearly or non-linearly mixed independent and 
hidden sources. There is a broad range of applications for ICA, such as blind source separation, feature 
& extraction and denoising. Particular applications include the analysis of financial and neurobiological data, 
£h fMRI, EEG, and MEG. For recent review concerning ICA the reader is referred to the literature [31 0]. 

Traditional ICA algorithms are one-dimensional in the sense that all sources are assumed to be independent 
real valued random variables. Nonetheless, applications in which only certain groups of sources are independent 
may be highly relevant in practice. In this case, the independent sources can be multidimensional. For instance, 
consider the generalization of the cocktail-party problem, where independent groups of people are talking about 
independent topics or more than one group of musicians is playing at the party. The separation task requires an 
extension of ICA, which can be called independent subspace analysis (ISA) [5], multidimensional independent 
component analysis (MICA) [6], and group ICA [7]. We will use the first of these abbreviations throughout 
this paper. 

Strenuous efforts have been made to develop ISA algorithms [SI |H1 ISl IS1 QUI UJ1 H21 1131 QSl IZl US1 US1 UZl UB] ■ 
For the most part, ISA-related theoretical problems concern the estimation of entropy or of mutual information. 
For this, the fc-nearest neighbors [13J and the geodesic spanning tree methods [H] can be applied. Other recent 
approaches seek independent subspaces via kernel methods [11] and joint block diagonalization [7] [To]. 

Another extension of the original ICA task is the blind source deconvolution (BSD) problem. Such a problem 
emerges, for example, at a cocktail-party being held in an echoic room. Several BSD algorithms were developed 
in the past. See, for example, the review of [19] . Like ICA, BSD has several applications: (i) remote sensing 
applications; passive radar/sonar processing [201 [21], (ii) image-deblurring, image restoration [22], (iii) speech 
enhancement using microphone arrays, acoustics [23j , 1241 25, 26j, (iv) multi- antenna wireless communications, 
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sensor networks [23 EH], (v) biomedical signal— EEG, ECG, MEG, fMRI— analysis (2HJ EH3 EE] , (vi) optics [32], 
(vii) seismic exploration [33] . 

The simultaneous assumption of the two extensions, that is, ISA combined with BSD, seems to be a more 
realistic model than either of the two models alone. For example, at the cocktail-party, groups of people 
or groups of musicians may form independent source groups and echoes could be present. This task will be 
called blind subspace deconvolution (BSSD). We treat the undercomplete case (uBSSD) here. In terms of the 
cocktail-party problem, it is assumed that there are more microphones than acoustic sources. Here we note that 
the complete, and in particular the overcomplete, BSSD task is challenging and as of yet no general solution 
is known. We can show that temporal concatenation turns the uBSSD task into an ISA problem. One of the 
most stringent applications of BSSD could be the analysis of EEG or fMRI signals. The ICA assumptions could 
be highly problematic here, because some sources may depend one another, so an ISA model seems better. 
Furthermore, the passing of information from one area to another and the related delayed and transformed 
activities may be modeled as echoes. Thus, one can argue that BSSD may fit this important problem domain 
better than ICA or even ISA. 

In principle, the ISA problem can be treated with the methods listed above. However, the dimension of 
the ISA problem derived from an uBSSD task is not amenable to state-of-the-art ISA methods. According 
to a recent decomposition principle, the ISA Separation Theorem |34| . the ISA task can be divided into two 
consecutive steps under certain conditions: after the application of the ICA algorithm, the ICA elements need 
to be grouped]^] The importance of this direction stems from the fact that ICA methods can deal with problems 
in high dimensions. The derived ISA task will be solved with the use of the decomposition principle augmented 
by the joint f-decorrelation (JFD) technique |17j . 

We show other ISA approaches beyond the JFD method: We adapted the kernel canonical correlation 
analysis (KCCA) and the kernel generalized variance (KGV) methods [36] to measure the mutual dependency 
of multidimensional variables. One can show that similarly to the JFD and the KCCA methods, the KGV 
technique deals with nonlinear decorrelation in function spaces. We found that they can be more precise but 
are limited to smaller problems. 

The paper is structured as follows: Section E] formulates the problem domain. Section [3] shows how to 
transform the uBSSD task into an ISA task. The JFD method, which we use to solve the derived ISA task, is 
the subject of Section HI This section also addresses how to tailor the KCCA and KGV kernel-ICA methods to 
solve the ISA problem. Section [5] contains the numerical illustrations and conclusions are drawn in Section (6[ 



The BSSD task and its special case, the ISA model, are defined in Section [2TTT Section [231 details the ambiguities 
of the ISA task. Section f2T3l introduces some possible ISA cost functions. 

2.1 The BSSD Equations 

Here, we define the BSSD task. Assume that we have M hidden, independent, multidimensional components 
(random variables). Suppose also that only their casual FIR filtered mixture is available for observation!! 



1=0 

where s(t) = [s 1 (t); . . . ; s M (i)] S M. Md is a vector concatenated of components s m (t) E M. d . For a given m, s m (t) 
is i.i.d. (independent and identically distributed) in time t, s m s are non-Gaussian, and /(s 1 , . . . , s ) = 0, where 

lr The possibility of such a decomposition principle was suspected by [6], who based his conjecture on numerical experiments. To 
the best of our knowledge, a proof encompassing sufficient conditions for this intriguing hypothesis was first published by [35] . 
2 Causal: I > in J^. FIR: the number of terms in the sum is finite. 
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I stands for the mutual information of the arguments. The total dimension of the components is D s := Md, 
the dimension of the observation x is D x . Matrices G R D ** D ° (/ = 0, . . . , L) describe the mixing, these are 
the mixing matrices. Without any loss of generality it may be assumed that E[s] = 0, where E denotes the 
expectation value. Then E[x] = holds, as well. The goal of the BSSD problem is to estimate the original 
source s(i) by using observations x(t) only. 

The case L = corresponds to the ISA task, and if d = 1 also holds then the ICA task is recovered. In 
the BSD task d = 1 and L is a non-negative integer. D x > D s is the undercomplete, D x = D s is the complete, 
and D x < D s is the overcomplete task. Here, we treat the undercomplete BSSD (uBSSD) problem. We will 
transform the uBSSD task to undercomplete ISA (uISA) or to complete ISA. Prom now on they both will be 
called ISA. 

Note 1 Mixing matrices Hj (0 < I < L) have a one-to-one mapping to polynomial matri^ 
H[z] := J^iLo^-i 2 ^ 1 ^ R[z] DxXDs , where z is the time-shift operation, that is (z _1 u)(i) = u(t — 1). H[z] may 
be regarded as an operation that maps D s - dimensional series to D x - dimensional series. Equation ([1]) can be 
written as x = H[z]s. 

Note 2 It can be shown J37| / that in the uBSSD task H[z] has a polynomial matrix left inverse W[z] G 
M.[z] DxXDs with probability 1, under mild conditions. In other words, for these polynomial matrices W[z] 
and H[z], W[z]H[z] is the identity mapping. The mild condition is as follows: Coefficients of polynomial ma- 
trix H[z], that is, random matrix [Ho; . . . ;Hl] is drawn from a continuous distribution. Under this condition, 
hidden source s(t) can be estimated by a suitable causal FIR filtered form of observation x(i). 

For the uBSSD task it is assumed that H[z] has a polynomial matrix left inverse. For the uISA and ISA 
tasks it is supposed that mixing matrix Ho G ^ DxXDs has full column rank, that is its rank is D s . 

2.2 Ambiguities of the ISA Model 

Because the uBSSD task will be reduced to ISA, it is important to see the ambiguities of the ISA task. First, 
the complete ISA problem (L = 0, D x = D s ) is presented, the undercomplete ISA will be treated later. 

The identification of the ISA model is ambiguous. However, the ambiguities are simple [38J: hidden 
multidimensional components can be determined up to permutation and up to invertible transformation within 
the subspaces. Ambiguities within the subspaces can be weakened. Namely, because of the invertibility of 
mixing matrix H[z) = H G R DsXDs , it can be assumed without any loss of generality that both the sources 
and the observation are white, that is, 

E[s] = 0,cov [s] = l De , 
E[x] = 0,cov [x] = I Dx , 

where Id s is the D s -dimensional identity matrix and cov is the covariance matrix. It then follows that the 
mixing matrix Ho and thus the demixing matrix W = Hq 1 are orthogonal: 

I Ds = cov[x] = E [xx*] = H £ [ss*] H^ = H I Ds HS = H H^, 

where * denotes transposition. In sum, Ho, W G Ds , where Ds denotes the set of L> s -dimensional orthogonal 
matrices. Now, s m sources are determined up to permutation and orthogonal transformation. 

In order to transform the undercomplete ISA task into a complete ISA task with white observations let 
C := cow fx! = Mxx* ] = H Hj5 G R D * xD * denote the covariance matrix of the observation. Rank of C is Da, 
since the rank of matrix Ho is D s according to our assumptions. Matrix C is symmetric (C = C*), thus it 

3 H[z] is also known as channel matrix or transfer function in the literature. 
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can be decomposed as follows: C = UDU*, where U G R lX s , and the columns of matrix U are orthogonal, 
that is, U*U = Id s - Furthermore, the rank of diagonal matrix D G M DsXDs is D s . The principal component 
analysis can provide a decomposition in the desired form. Let Q := D -1 / 2 U* G M. DsXDx . Then the original 
observation x can be modified to x' := Qx = QHos G MP a . The resulting x' is white and can be regarded as 
the observation of a complete ISA task having mixing matrix QHo G 0^*" . 

2.3 ISA Cost Functions 

After the whitening procedure (Section I2.2[) . the ISA task can be viewed as the minimization of the mutual 
information between the estimated components on the orthogonal group: 

J / (W):=7(y 1 ,...,y M ), (2) 

where y = Wx, y = [y 1 ; . . . ; y M ] , y m G R d , and W G O d . This formulation of the ISA task serves us m 
Section 14.11 where we estimate the dependencies of the multidimensional variables. 

The ISA task can be rewritten into the minimization of the sum of Shannon's multidimensional differential 
entropies [13] : 

M 

J H (W):=Y,H(y m ), (3) 

m=l 

where y = Wx, y = [y 1 ; . . . ; y M ] , y m G R d , W G O d . 

Note 3 Until now, we formulated the ISA task by means of the entropy or the mutual information of multidi- 
mensional random variables, see Equations ([2]) and ([3|) . However, any algorithm that treats mutual information 
between 1-dimensional random variables can also be sufficient. This statement is based on the considerations 
below. Well-known identities of mutual information and entropy expressions 1391/ show that the minimization 
of cost function 

M d M 

Jn,im ■■= EEw - E • • • .iff). 

m=l i=l m=l 

or that of 

M 

Jjj(W) :=l(yl,..., yf) - £ I (y?, y^) 

771=1 

can also solve the ISA task. Here, y = Wx is the estimated ISA source, where x G MP is the whitened 
observation in the ISA model. W G D is the estimated ISA demixing matrix, and in y = [y 1 ; . . . ;y A/ ] G R D 
the y m G M. d , m = 1,... ,M, represent the estimated components with coordinates yf 1 G BL The first term 
of both cost functions Jh,i o,nd Jj j is an ICA cost function. Thus, these first terms can be fixed by means 
of ICA preprocessing^ In this case, if the Separation Theorem holds (for details see Section \3.2\) . then term 
X^m=l I(yT-> ■ ■ ■ i Vc?) i m pti es that the maximization of the sum of mutual information between 1-dimensional 
random variables within the subspaces is sufficient for solving the ISA task. 

3 Reduction Steps 

Here we show that the direct search for inverse FIR filter can be circumvented (Note [2]) . Namely, temporal 
concatenation reduces the uBSSD task to an (u)ISA problem (Section 13 . X [) . Our earlier results will allow 
further simplifications. We will reduce the ISA task to an ICA task plus a search for optimal permutation of 

4 From the algorithmic point of view, any ICA algorithm that minimizes cost function I(yl, . . . , y^ 1 ) suits the ICA preprocessing 
step. 
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the ICA coordinates. This decomposition principle will be elaborated in Section [321 by means of the Separation 
Theorem. 



3.1 Reduction of uBSSD to (u)ISA 



We reduce the uBSSD task to an ISA problem. The BSD literature provides the basis for our reduction; |40j 
use temporal concatenation in their work. This method can be extended to multidimensional s m components 
in a natural fashion: 
Let L' be such that 

D x ti > D S {L + L') (4) 
is fulfilled. Such V exists due to the undercomplete assumption D x > D s : 



L' > 



D x -D s 



(5) 



This choice of L' guarantees that the reduction gives rise to an (under) complete ISA task: let x m (t) denote the 
m th coordinate of observation x(i) and let the matrix G ^ D ^ xMd be decomposed into 1 x d sized blocks. 



That is, = | 
Using notations 



3 - 3 -l H=l..D x ,j=l.,M 



(H 6 



l lxd ), where i and j denote row and column indices, respectively. 



S m (t) := [ 


s m (i); 


s m (t- l);...;s m (t- 


- (L + V) + 1)] G R d ( L+L ') 


X m (t) := [ 


•Em (t) 


•Em !))•••) •Em 


-L' + £l L ', 


m ■■= i 


s\ty, 


■ S M (t)} e ^ Md ( L + L ')= D s( L + L ') 


X(t) := [ 




...;X D *(t)} eR DxL 
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... II)' ... 







A ij : = 








e E i'xd(i+i') s 







... H# ... 






A:=[ 




l..D w j=l..M€K D ' L '* 


Md{L+L')=D x L' xD s {L+L') 



model 

X(t) = AS(t) (6) 

can be obtained. Here, s m (t)s are i.i.d. in time t, they are independent for different m values, and Equation 
holds for V . Thus, (|6|) is either an undercomplete or a complete ISA task, depending on the relation of the 
l.h.s and the r.h.s of (|4|): the task is complete if the two sides are equal. The number of the components and 
the dimension of the components in task ([6]) are M(L + L') and d, respectively. 

If we end up with an undercomplete ISA problem in ([6]) then it can be reduced to a complete one, as was 
shown in Section [2.2i Thus, choosing the minimal value for L' in ([5]), the dimension of the obtained ISA task 
is 



Asa :=D s {L + L')=D s [L + 



D,L 



D~ - D, 



(7) 



Taking into account the ambiguities of the ISA task (Section l2.2p . the original s m components will occur L + L' 
times and up to orthogonal transformations. As a result, in the ideal case, our estimations are as follows 



G 



where k = 1, . . . , L + L', Ff G d . 
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3.2 Reduction of ISA to ICA 



The Separation Theorem |34j conjectured by [6] allows one to decompose the solution of the ISA problem, under 
certain conditions, into 2 steps: In the first step, ICA estimation is executed by minimizing I(y\, ■ ■ ■ ,y¥)- In 
the second step, the ICA elements are grouped by finding an optimal permutation. This principle will be 
formalized in Section 13.2.11 Section 13.2.21 provides sufficient conditions for the theorem. 

3.2.1 The ISA Separation Theorem 

We state the ISA Separation Theorem for components having possibly different d m dimensions: 

Theorem 1 (Separation Theorem for ISA) Let y = [yi; ■ ■ ■ ;vd] = Wx G m D , where W G O d , x G 
M. D is the whitened observation of the ISA model, and D = X^m=i ^m* Let § dm denote the surface of the 
d m - dimensional unit sphere, that is § dm := {w G M. dm : Yli=i w t = !}• 

Presume that the u := s m G M. dm sources (m = 1, . . . , M) of the ISA model satisfy condition 

(dm \ dm 

Y w m z w f H > Vw G §dm > ( 8 ) 
i=l J i=l 

and that the ICA cost function Jjca(W) = Y2i=i H-(Vi) has minimum over the orthogonal matrices in Wjca- 
Then it is sufficient to search for the solution to the ISA task as a permutation of the solution of the ICA task. 
Using the concept of demixing matrices, it is sufficient to explore forms 

Wisa = PWica, 

where P G M. DxD is a permutation matrix to be determined and Wisa is the ISA demixing matrix. 

The proof of the theorem is presented in Appendix |A] It is intriguing that if (jSJ) is satisfied then the simple 
decomposition principle provides the global minimum of ([2J- In the literature on joint block diagonalization 
(JBD) [5T] have put forth a similar conjecture recently. According to this conjecture, for quadratic cost function, 
if Jacobi optimization is applied, the block-diagonalization of the matrices can be found by the optimization 
of permutations following the joint diagonalization of the matrices. ISA solutions formulated within the JBD 
framework [3 [T5l [IB] [T7] make efficient use of this idea in practice. [161 could justify this approach for local 
minimum points. 

3.2.2 Sufficient Conditions of the ISA Separation Theorem 

The question of which types of sources satisfy the Separation Theorem is open. Equation ([8]) provides only 
a sufficient condition. Below, we list sources s m that satisfy ([8]). Details and the extension of the Separation 
Theorem for complex variables can be found in a technical report of |34j . 

1. Assume that variables u = s m satisfy the so-called w-EPI condition (EPI is shorthand for the entropy 
power inequality [39j), that is, 

d 
i=l 

Then inequality ([8]) holds for these variables too. The proof can be found in Lemma [T] of Appendix lAl 

2. The © w-EPI condition is valid 
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(a) for spherically symmetric or shortly spherical variables [42| . The distribution of such variables is 
invariant for orthogonal transformations^ Sketch of the proof (u = s m ): the w-EPI condition 
concerns projections to unit vectors. For spherical variables, the distribution and thus the entropy 
of these projections are independent of w € § d . Because e 2H ( w i u i) = e 2H i u i)yJj anc l w g the 
w-EPI is satisfied with equality Vw G § d . □ 

(b) for 2-dimensional variables invariant to 90° rotation. Under this condition, density function h of 
component s m is subject to the following invariance 

h(u\,u 2 ) = h(-u 2 ,ui) = h(—ui, -u 2 ) = h(u 2 , -u\) (Vu G M 2 ) . 

Sketch of the proof (u = s m ): Assume that function / : S 2 3 w i— > H ^X/i=i w i u ij has global 

minimum on set S 2 H {w > 0}H Let this minimum be at w m G M 2 . Then, the 90° invariance 
warrants that function / take its global minimum also on G R 2 , which is perpendicular to w m . 
Let (C m )* = [w m ,w^] G O 2 . Now, we can estimate variables C m s m . This is sufficient because the 
ISA solution is ambiguous up to orthogonal transformations within each subspace. □ 
A special case of this requirement is invariance to permutation and sign changes 

h(±ui,±u 2 ) = h(±u 2 ,±ui). 

In other words, there exists a function g : R 2 — > R, which is symmetric in its variables and 

h(u) = g(\ux\, \u 2 \). 

Special cases within this family are distributions 




ip > 0), 



which are constant over the spheres of L p -space. They are called L p spherical variables which, for 
p = 2, corresponds to spherical variables. 

(c) for certain weakly dependent variables: [33] has determined sufficient conditions when EPI holds|j| If 
the EPI property is satisfied on unit sphere § d , then the ISA Separation Theorem holds (Lemma [1]). 

These results are summarized schematically in Table [TJ 



4 ISA Methods 

We showed how to convert the uBSSD task to an ISA task in Section 13. 1L In the following we will present 
methods that can solve the ISA task. In Section [4^1] we treat estimations of the mutual information of the ISA 
cost functions in Section T2.31 Methods that can optimize these cost functions are elaborated in Section T4.21 We 
also present here the pseudocode of the procedures studied. In Section 14.31 we review methods that can treat 
non-equal or unknown component dimensions. In what follows, and in accordance with (pQ), let x G M> D denote 
the whitened observation, while y = [y 1 ; . . . ; y M ] = Wx G R D (W G D ) and y m G R d (m = 1, . . . , M) stand 
for the estimated source and its components in the ISA task, respectively. 

5 In the ISA task the non-degenerate affine transformations of spherical variables, the so called elliptical variables, do not provide 
valuable generalizations due to the ambiguities of the ISA task. 
6 Relation w > concerns each coordinate. 

7 The constraint of d — 2 may be generalized to higher dimensions. We are not aware of such generalizations. 
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invariance to 90° rotation (d = 2) 

specially 



Takano's dependency 
(d = 2) 



w- 



;pi- 



invariance to sign and permutation 

specially 

LP spherical (p > 0) 
t 

generalization for d = 2 

^=^= spherical symmetry 



Equation ([8]): sufficient 
for the ISA Separation Theorem 

Table 1: Sufficient conditions for the ISA Separation Theorem. 



4.1 Dependency Estimations 



Here we introduce two dependency estimators. First, in Section [4.1.11 we describe a decorrelation method that 
uses a set of functions jointly. This method is called joint f-decorrelation (JFD) method [17] ■ Our second 
technique (Section I4.1.2P generalizes earlier kernel-ICA methods for the ISA task. The motivation for this 
latter method is the efficiency and precision of kernel-ICA methods in finding independent components [36J. 
Our experiences are similar with kernel-ISA methods, see Section T5.31 We found that kernel-ISA methods need 
more computations, but can provide more precise solutions than the JFD technique. 



4.1.1 The JFD Method 

The JFD method estimates the hidden s m components through the decorrelation over a function set 5(3 f) 
|17j . Formally, let the empirical f-covariance matrix of y(t) and y m (t) for function f = [f 1 ;... ; f M ] € 3" over 
t = 1, . . . , T be denoted by 

E(f,T,W) = ca& [f(y),f(y)] = 

T ( T *\ ( T 

= T E f - t E f frW] f ^)] - \ E f ^ 

t=l l k=l J l k=l 

£^'(f,T, W) = cov [r (y*) ,P (yJ')] = 

T ( T ~\ ( T 

= ^E f V(*)i -jEWi WW - ^E fJ [y J ( fc )] 

t=l l k=l J l k=l 

Then, the joint decorrelation on 7 can be formulated as the minimization of cost function 

Jjfd(W) :=El|NoS(f,r,W)||^. (10) 
fe3~ 

Here: (i) W E D , (ii) denotes a set of M D — > M D functions, and each function acts on each coordinate 
separately, (hi) o denotes the point- wise multiplication, called the Hadamard-product, (iv) N masks according 
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to the subspaces, N := — 1m (8> E<j, where all elements of matrix Ed G R DxD and E^ G R dxd are equal to 1, 
<g> is the Kronecker-product, (v) ||-||^, denotes the square of the Frobenius norm, that is, the sum of the squares 
of the elements. 

Cost function (jlOp can be interpreted as follows: for any function f m : R rf — > R rf that acts on independent 
variables y m (m = 1, ...,M) the variables f m (y m ) remain independent. Thus, covariance matrix S(f, T, W) 
of variable f(y) = [f 1 (y 1 ) ; . . . ; f (y )] is block-diagonal. Independence of estimated sources y m is gauged 
by the uncorrelatedness on the function set Thus, the non-block-diagonal portions (S* ,:, (f, T, W), i ^ j) of 
covariance matrices £(f, T, W) are punished. This principle is expressed by the term ||N o S(f, T, W)|j^. 



4.1.2 Kernel-ISA Methods 

Two alternatives for the ISA cost function of ()10p are presented. They estimate the mutual information based 
ISA cost defined in ([2]) via kernels: the KCCA and KGV kernel-ICA methods of [36J are extended to the 
ISA task. The original methods estimate pair-wise independence between 1- dimensional random variablesll 
The extension to the multidimensional case is straightforward, the arguments of the kernels can be modified 
to multidimensional variables and the derivation of [36] can be followed. The main steps are provided below 
for the sake of completeness. The resulting expressions can be used for the estimation of dependence between 
multidimensional random variables. The performance of these simple extensions on the related ISA applications 
is shown in Section 15.3.21 



The KCCA Method First, the 2-variable-case is treated and then it will be generalized to many variables. 

2- variable-case Assume that the mutual dependence of two random variables u G R rfl and v G R rf2 has to 
be measured. Let positive semi- definite kernels k u {-, •) : R dl x R dl -> R, and A; v (-, •) : R d2 x R d2 — > R be chosen 
in the respective spaces. Let 3~ u and 3^ denote the reproducing kernel Hilbert spaces (RKHS) [SI HSJ H6] 
associated with the kernels. Here, GF™ and 3^ are function spaces having elements that perform mappings 
R dl — > R and R d2 — > R, respectively. Then the mutual dependence between u and v can be measured, for 
instance, by the following expression: 



j kcca( u > v,^,^) := sup corr[#(u),/i(v)], 

where corr denotes correlation. 

The value of «/kcca can ^ e estimated empirically: assume that we have T samples both from u and 

T 

from v. These samples are ux, . . . , uj- G M. dl and vi, . . . , Vt G R^ 2 . Then, using notations g := A ^ <?(ufc), 

k=l 

T 

h := y 5( v fc)i the empirical estimation of Jkcca COUi d be the following: 



k=l 



T *,emp i rru rrvN .. . T Et=lb( U *) ~ SP( V t) ~ h \ 



However, it is worth including some regularization for Jj^ccA @3h therefore J^cca ^ s m odified to 

Jkcca(u,v,^,^):= sup COVb(u) ' Mv)] (11) 



„e3» fteS* / var ^(u)] + K || 5 ||^ Jvar [fc(v)] + k \\h\\^ 



We note that if our observations are generated by an ISA model then — unlike in the ICA task when d = 1 — pairwise inde- 
pendence is not equivalent to mutual independence [2j [14] . Nonetheless, according to our numerical experiences it is an efficient 
approximation in many situations. 
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i 1 1 2 1 1 1 1 2 

where expression 'var' stands for variance, k > is the regularization parameter, \\-\\^a and ||.||gv denote the 
RKHS norm of their arguments in 3 ru and 3^, respectively. Now, expanding the denominator up to second 
order in n, setting the expectation value of the samples to zero in the respective RKHSs, and using the notation 
k 2 := [36], the empirical estimation of (fTT|) is 

47 CA (u,v,J u ,in= sup ctK"Kv C2 (i2) 

Cl£RT ' C2eRT ft (k U + . 2 I t ) 2 Ci ^(kv + . 2 I t ) 2 c 2 

where K u , K v € M TxT are the so-called centered kernel matrices: These matrices are derived from kernel 
matrices K u = [fc(uj, u :/ -)]jj = i v .. j T, K v = [/c(vj, ^j)]i,j=i,...,T G M TxT , as is described below. Let It G M T 
denote a vector whose all elements are equal to 1 and let H := It — ^It^-t G K TxT denote the so-called 
T-dimensional centering matrix. Then K u := HK U H, K v := HK V H. 

Computing the stationary points of J^cca ^ n CG]); that is, setting = — %^. CA , the resulting task is to solve 
a generalized eigenvalue problem of the form C£ = //D£: 

(KM- k 2 I t ) 2 ^ K U K V \ /cA = 7(K U + k 2 It) 2 \ /cA 

K v K u (kM-k 2 It)7 W l+7J ^ (K v + k 2 It) 2 J V c 2/ ' 

where the objective is to maximize 7 := c^K u K v c 2 . Our task is to estimate </kcca> the maximum of 7. 



Generalization for many variables The KCCA method can be generalized for more than two random 
variables and can be used to measure pair-wise dependence: Let us introduce the following notations: Let 
y 1 € M. dl , . . . ,y M € M. dM be random variables. We want to measure the dependence between these variables. 
Let positive semi-definite kernels k m (-,-) : W dm x M. dm — > M. (m = 1,...,M) be chosen in the respective 
spaces. Let J m denote the RKHS associated with kernel k m (-, •). Having T samples y™, . . . , for all random 
variables y m (m = 1,...,M), matrices K m := [k m {yf, yf)]i,j=i,...,T G K TxT and K m := HK m H G M TxT 
can be created. Let the regularization parameter be chosen as At > and let k 2 denote the auxiliary variable 
k 2 := 4^. It can be proven that the computation of ^kcca involves the solution of the following generalized 
eigenvalue problem: 



/(K 1 + k 2 1 t ) 2 K X K 2 

K 2 K* (K 2 + k 2 I t ) 2 



K 1 ^ \ /ci\ 

K 2 K M c 2 



\ K^K 1 



K M K 2 ••• (K a/ + k 2 I t ) 2 / 
/(K 1 + k 2 I t ) 2 



\c M J 



V 











(K 2 + K 2 I) 












\ / c l\ 

c 2 



(K M + K2 I) 2 / 



(13) 



Analogously to the two- variable-case, the largest eigenvalue of this task is a measure of the value of the pair-wise 
dependence of the random variables. 



The KGV Method Equation (|2j) in Section T2.3I indicates that the ISA task can be seen as the minimization 
of the mutual information. The basic idea of the KGV technique is that — even for non-Gaussian variables — 
it estimates the mutual information by the Gaussian approximation [36]. Namely, let y = [y 1 ;...^ ] be 
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multidimensional normal random variable with covariance matrix C. Let C % ' J G M. dmXdm denote the cross- 
covariance between components of y m E M rfm . The mutual information between components y 1 ,...,^"^ is 



/(yl y M) = _Iw( d6tC 

^ '•••' y } 2^0^ detC 



dctC 



The quotient 

n~=idrtC" 



is called the generalized variance. If y is not normal — this is the typical situation 



in the ISA task — then let us transform the individual components y m using feature mapping ip associated with 
the reproducing kernel and assume that the image is a normal variable. Thus, the cost function 



</kgv(W) := --log 



det(TC) 



rC=idet(XT 



(14) 



is associated with the ISA task. In Equation p4|) 4>(y) := [(^(y 1 ); . . . ; ip(y M )], 7C := cov[<p(y)], and the 



sub-matrices are 7C' J = cov[ip(y l ), tp(y J )]. Expression 



dct(/C) 



m U det(K m ' m ) 



is called the kernel generalized variance 



(KGV). 

The next theorem shows that the KGV technique can be interpreted as a decorrelation based method: 

Theorem 2 Let £ G W DxD be a positive semi-definite matrix, let £ m,m g ygdmxdm d eno f e fa e m th frfo^ ^ n f^ e 
diagonal of matrix S, and let D = YLm=i ^m- Then the function 

det(S) 



< Q(S) 

is iff Z = blockdiagiT, 1 ' 1 , . . . ,E M ' M ). 



-- log 

2 8 



nt f =1 det(^ 



This theorem can be proven for d m > 1, as in the case of d m = 1 [39], see the work of [T7J. The theorem 
implies the following: 

Corollary Setting S := /C, t/ie i^GF technique is a decorrelation technique according to feature mapping (p. 
The KGV technique aims at minimizing of cross- covariances K. l ' J = cov [<p(y l ), fiy 3 )] to 0. 

We note that the kernel covariance (KC) ICA method [38] — similarly to the KCCA method — can be ex- 
tended to measure the mutual dependence of multidimensional random variables and thus to solve the ISA 
task. Again, the computation of the cost function can be converted to the solution of a generalized eigenvalue 
problem. This eigenvalue problem is provided in Appendix iBl for the sake of completeness. 

Note 4 The KCCA, KGV and KC methods can estimate only pair-wise dependence. Nonetheless, the joint 
mutual information can be estimated by recursive methods computing pair-wise mutual information: for the 
mutual information of random variables y m G M. dm (m = 1, . . . , M) it can be shown that the recursive relation 

M 

i(y\..., y ^ = £/(y^[y m+ \...,y M ]) (15) 

m=l 

holds [39]. Thus, for example, the KCCA eigenvalue problem of (fT3j) can be replaced by pair-wise estimation of 
mutual information. We note that the tree- dependent component analysis model /71]/ estimates the joint mutual 
information from the pair-wise mutual information. 
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Input of the algorithm 




ISA observation: {x.(t)}t=x t 




Optimization^ 




ICA: on the whitened observation x 


=> sica estimation 


Permutation search 




P :=I D 




repeat 




sequentially for Vp E S mi , q 




if J(P pq P) < J(P) 




P — P P 




end 




until J(-) decreases in the sweep 


above 


Estimation 




SlSA = PsiCA 





Table 2: Pseudocode of the ISA Algorithm. Cost J stands for the ISA cost function of JFD, KCCA, or KGV 
methods. The permutation matrix of the ISA Separation Theorem is the variable of J. 

4.2 Optimization of ISA Costs 

There are several possibilities to optimize ISA cost functions: 

1. Without ICA preprocessing, optimization problems concern either the Stiefel manifold |491 1501 f5T |, 152] or 
the flag manifold [53J. According to our experiences, these gradient based optimization methods may be 
stuck in poor local minima. 

2. According to the ISA Separation Theorem, it may be sufficient to search for optimal permutation of the 
ICA components provided by ICA preprocessing. We applied greedy permutation search: two coordinates 
of different subspaces are exchanged provided that this change decreases cost function J. Here, J denotes, 
for example, Jjfd, ^kcca, or Jkgv depending on the ISA technique applied. The variable of J is the 
permutation matrix P using the parametrization Wisa = PWtqa- Pseudocode is provided in Table [2j 
Our experiences show that greedy permutation search is often sufficient for the estimation of the ISA 
subspaces. However, it is easy to generate examples in which this is not true [14J. In such cases, global 
permutation search method of higher computational burden may become necessary |35j . 

4.3 Different and Unknown Component Dimensions 

Here we give a quick overview how one can handle situations when the dimensions of the subspaces are unequal, 
unknown, or both. Note that the introduced uBSSD-ISA reduction, the ISA ambiguities [16] and the Separation 
Theorem remain the same for subspaces of different dimensions, and thus it is sufficient to consider the ISA 
problem. 

1. If the dimensions of the subspaces are different but known, the ISA task can be solved 

(a) the mask N of the JFD method should be modified (see Equation HOD . 

(b) the kernel-ISA methods include this situation; they were introduced for different subspace dimen- 
sions. 

9 Let 9 1 , • • • > S M denote the indices of the I s * , . . . , M th subspaces, that is, S m := {(m — l)d+ 1, . . . , md}, and permutation matrix 
P P q exchanges coordinates p and q. 
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2. If the dimension of the hidden source s is known, but the individual dimensions of components s m are 
not, then clustering can exploit the dependencies between the coordinates of the estimated sources. For 
example: 

(a) If we assume that the hidden s source has block diagonal AR dynamics of the form s(f + 1) = 
Fs(t) + e(t) — F = blockdiag (F 1 , . . . ,F M ) — then connectivity of the estimated matrix F may help 
|54j . One may assume that the i and the j coordinates are 'F-connected' if value maxUFjjl, \Fji\} 
is above a certain threshold. 

Similar considerations can be applied in the ISA problem, for example, by using cumulant based 
matrices [16| . 

Weaknesses of the threshold based method include (i) the uncertainty in choosing the threshold, 
and (ii) the fact that the methods are sensitive to the threshold. More robust solutions can be 
designed if the dependencies, for example, the mutual information amongst the coordinates, are 
used to construct an adjacency matrix and apply a clustering method for this matrix. One might 
use, for example, hierarchical [12j or tree-structured clustering methods [llj . 



5 Illustrations 

The efficiency of the algorithms of Section0]are illustrated. Test cases are introduced in Section [5TT1 The quality 
of the solutions will be measured by the normalized Amari-error, the Amari- index (Section 15 . 2() . Numerical 
results are presented in Section 15.31 



5.1 Databases 

We define five databases to study our identification algorithms. We do not know whether they satisfy (|8|) or 
not. According to our experiences, the ISA Separation Theorem works on these examples. 



5.1.1 The 3D-geom, the Celebrities and the ABC Database 

The first 3 databases are illustrated in Figure [TJ In the 3D-geom test s m s were random variables uniformly dis- 
tributed on 3-dimensional geometric forms {d = 3). We chose 6 different components (M = 6) and, as a result, 
the dimension of the hidden source s is D s = 18. The celebrities test has 2-dimensional source components gen- 
erated from cartoons of celebrities (d = 2)0 Sources s m were generated by sampling 2-dimensional coordinates 
proportional to the corresponding pixel intensities. In other words, 2-dimensional images of celebrities were 
considered as density functions. M = 10 was chosen. In the ABC database, hidden sources s m were uniform 
distributions defined by 2-dimensional images (d = 2) of the English alphabet. The number of components 
varied as M = 2, 5, 10, 15, and thus the dimension of the source D s was 4, 10, 20, 30, respectively. 

5.1.2 The all-k-independent Database 

The d-dimensional hidden components u := s m were created as follows: coordinates Ui{t) {i = 1, . . . , k) were 
uniform random variables on the set {0,. . . ,k-l}, whereas Uk+i was set to mod(ui + . . . + Uk,k). In this 
construction, every fe-element subset of {ui, . . . , Mfc+i} is made of independent variables. This database is 
called the all-k-independent problem [14|, 135] . In our simulations d = k + 1 was set to 3 or 4 and we used 2 
components (M = 2). Thus, source dimension D s was either 6 or 8. 



See http://www.smileyworld.com 
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(b) 

Figure 1: Illustration of the 3D-geom, celebrities and ABC databases, (a): database 3D-geom, 6 3-dimensional 
components (M = 6, d = 3). Hidden sources are uniformly distributed variables on 3-dimensional geometric 
objects, (b): database celebrities. Density functions of the hidden sources are proportional to the pixel 
intensities of the 2-dimensional images {d = 2). Number of hidden components: M = 10. (c): database ABC. 
Here, the hidden sources s m are uniformly distributed on images {d = 2) of letters. Number of components M 
varies between 2 (A and B) and 15 (A-O). 



5.1.3 The Beatles Database 

Our Beatles test is a non-i.i.d. example. Here, hidden sources are stereo Beatles songs 1^1 8 kHz sampled 
portions of two songs (A Hard Day's Night, Can't Buy Me Love) made the hidden s m s. Thus, the dimension 
of the components d was 2, the number of the components M was 2, and the dimension of the problem D s was 
4. 



5.2 Normalized Amari-error, the Amari-index 



We have shown in Section [3. II that the uBSSD task can be reduced to an ISA task. Consequently, we use ISA 
performance measure to evaluate our algorithms. Assume that there are M pieces of cf-dimensional hidden 
components in the ISA task, A is the mixing matrix, and W is the estimated demixing matrix. Then optimal 
estimation provides matrix G := WA, a block-permutation matrix made of d x d sized blocks. Let matrix 



G G 



be decomposed into d x d blocks: G 



values of the elements of matrix G J,jf £ M 
adapted to the ISA task [15] defined as: 



ixd 



[G 



i,j=l,...,M' 

We used the normalized version 



Let <7* ,J denote the sum of the absolute 
331 of the Amari-error [55] 



r(G) 



1 



2M(M - 1) 




max,- g l i 



M 



+ E 



maxj g 



We refer to the normalized Amari-error as the Amari-index. One can see that < r(G) < 1 for any matrix 
G, and r(G) = if and only if G is a block-permutation matrix with d x d sized blocks. Normalization is 
advantageous; we can compare the precision of ISA procedures and procedures, which are reduced to ISA tasks. 



5.3 Simulations 

Results on databases 3D-geom, celebrities, ABC, all-k -independent and Beatles are provided here. These 
experimental studies have two main parts: 

11 See http://rock.mididb.com/beatles/ 
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L = 1 


L = 2 


L = 3 


L = 4 


L = 5 


3D-geom 


0.25% (±0.01) 


0.27% (±0.03) 


0.28% (±0.02) 


0.29% (±0.03) 


0.29% (±0.01) 


celebrities 


0.37% (±0.01) 


0.38% (±0.01) 


0.39% (±0.01) 


0.39% (±0.01) 


0.40% (±0.01) 



Table 3: The Amari-index of the JFD method for database 3D-geom and celebrities, for different convolution 
lengths: average ± deviation. Number of samples: T = 100,000. For other sample numbers between 1,000 < 
T < 100,000 see Figure El 

1. The efficiency of the JFD method on the uBSSD task is demonstrated in Section [5.3.11 

2. The derived KCCA, KGV kernel-ISA methods were tested on ISA tasks. We show examples in which 
these methods are favorable over the JFD method in Section 15.3.21 

In both cases the tasks are either ISA tasks or can be reduced to ISA (Section 13. ip . Thus, we used the Amari- 
index (Section I5.2j) to measure and compare the performance of the different methods. For each individual 
parameter, 50 random runs were averaged. Our parameters are: T, the sample number of observations x(i), 
L, the parameter of the length of the convolution (the length of the convolution is L + 1), M, the number of 
the components, and d, the dimension of the components, depending on the test. Random run means random 
choice of quantities H[z] and s. 

5.3.1 JFD on uBSSD 

Our results concerning the uBSSD task are delineated. As we showed in Section [3TT| the temporal concatenation 
can turn the uBSSD task into an ISA problem. These ISA tasks associated with simple uBSSD problems can 
easily become more than 100-dimensional. Earlier ISA methods cannot deal with such 'high dimensional' 
problems. This is why we resorted to the recent JFD method (Section I4.1.1h . which seemed to be efficient in 
solving such large problems under the following circumstances: Equation Q implies that the dimension -Disa 
of the derived ISA task with fixed L and D s decreases provided that the difference D x — D s > 1 increases. This 
coincides with our experiences: the higher this difference is, the smaller number of samples can reach the same 
precision. Below, studies for D x — D s = D s (D x = 2D S ) are presented. This choice was amenable to the JFD 
method o In this case the dimension of the ISA task in (|7|) simplifies to the form 

Asa = 2D S L. 

The JFD technique works with the pseudocode given in Table [2j it reduces the uBSSD task to the ISA 
task, where the fastICA algorithm [56] was chosen to perform the ICA computation. In the JFD cost, we chose 
manifold 3" as 3" := {u — > cos(u),u — > cos(2u)}, where the functions operated on the coordinates separately 
|17j . For the 'observations', the elements of mixing matrices in Equation ([I]) were generated independently 
from standard normal distributions P^l 

We studied the dependence of the precision versus the sample number on databases 3D-geom and celebrities. 
The dimension and the number of the components were d = 3 and M = 6 for the 3D-geom database and d = 2 
and M = 10 for the celebrities database, respectively. In both cases the sample number T varied between 1, 000 
and 100, 000. The parameter of the length of the convolution took L = 1, . . . , 5 values. Thus, the length of the 
convolution changed between 2 and 6. Our results are summarized in Figure [2j The values of the errors are 
given in Table O The number of sweeps needed to optimize the permutations after performing ICA is provided 
in Figure El Figures H] and [5] illustrate the estimations of the JFD technique on the 3D-geom and the celebrities 
databases, respectively. 

12 We note that the hardest D x — D s = 1 task is also feasible. However, the sample number necessary to find the solution grows 
considerably, as can be expected from {5). 

13 Uniform distribution on [0.1], instead of normal distribution, showed similar performance. 
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uBSSD: 3D-geom (JFD) 
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Figure 2: Estimation error of the JFD method on the 3D-geom and celebrities databases: Amari-index as a 
function of sample number on log-log scale for different convolution lengths, (a): 3D-geom, (b): celebrities 
database. Dimension of the ISA task: -Disa- For further information, see Tabled 
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uBSSD: 3D-geom (JFD) 



I mm 
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(a) 
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uBSSD: celebrities (JFD) 




n 



(b) 



Figure 3: Number of sweeps in permutation search needed for the JFD method as a function of the convolution 
length, (a): 3D-geom, (b): celebrities database. Black: minimum, gray: average, light gray: maximum. 



16 



% m m 

Hi' Hi- 



(a) 





(c) 



Figure 4: Illustration of the JFD method on the uBSSD task for the 3D-geom database. Sample number 
T = 100,000, convolution length L = 1, Amari-index: 0.25%. (a): observed convolved signals x(i). (b) 
Hinton-diagram: the product of the mixing matrix of the derived ISA task and the estimated demixing matrix 
(= approximately block-permutation matrix with 3x3 blocks), (c): estimated components. Note: hidden 
components are recovered L + V = 2 times, up to permutation and orthogonal transformation. 




(a) (b) (c) 

Figure 5: Illustration of the JFD method on the uBSSD task for the celebrities database. Sample number 
T = 100,000, convolution length L = 1, Amari-index: 0.37%. (a): observed convolved signals x(i). (b) 
Hinton-diagram: the product of the mixing matrix of the derived ISA task and the estimated demixing matrix 
(= approximately block-permutation matrix with 2x2 blocks), (c): estimated components. Note: hidden 
components are recovered L + L' = 2 times, up to permutation and orthogonal transformation. 
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L = 1 


L = 2 


L = 5 


L = 10 


L = 20 


L = 30 


0.41% (±0.06) 


0.44% (±0.05) 


0.46% (±0.05) 


0.47% (±0.03) 


0.66% (±0.13) 


0.70% (±0.11) 



Table 4: Amari-index of the JFD method for ABC database for different convolution lengths: average ± 
deviation. Number of samples: T = 75,000. For other sample numbers between 1,000 < T < 75,000 see 
Figure [6{a). 




Figure 6: (a): Amari-index of the JFD method on the ABC database as a function of sample number and 
for different convolution lengths on log-log scale, (b): Number of sweeps of permutation optimization on the 
derived ISA task as a function of convolution length. Dimension of the ISA task: Z?isa- Black: minimum, 
gray: average, light gray: maximum. For further information, see Table HI 



Figure [2] demonstrates that the JFD algorithm was able to uncover the hidden components with high 
precision. The precision of the estimations shows similar characteristics on the 3D-geom and the celebrities 
databases. The Amari-index is approximately constant for small sample numbers. For each curve, above a 
certain threshold the Amari-index decreases suddenly and after the sudden decrease the precision follows a 
power law r(T) cx T~ c (c > 0). The power law decline is manifested by straight line on log-log scale. The 
slopes of these straight lines are very close to one another. The number of sweeps was between 2 and 11 (2 and 
8) for the 3D-geom (celebrities) tests over all sample numbers, for 1 < L < 5 and for 50 random initializations. 
According to Table El the Amari-index for sample number T = 100, 000 is below 1% (0.25 — 0.40%) with small 
standard deviations (0.01 — 0.03). 

In another test the ABC database was used. The number and the dimensions of the components were 
minimal (d = 2, M = 2) and the dependence on the convolution length was tested. Parameter L took values on 
1, 2, 5, 10, 20, 30. The number of observations varied between 1, 000 <T< 75, 000. The Amari-index and the 
sweep number of the optimization are illustrated in Figure El Precise values of the Amari-index are provided 
in Tabled 

According to Figure [6l the JFD method found the hidden components. The 'power law' decline of the 
Amari-index, which was apparent in the 3D-geom and the celebrities databases, appears for the ABC test, 
too. The figure indicates that for 75, 000 samples and for L = 30 (convolution length is 31) the problem is still 
amenable to the JFD technique. The number of sweeps required for the optimization of the permutations was 
between 1 and 8 for all sample numbers 1,000 < T < 75,000, parameters 1 < L < 30 and for all 50 random 
initializations. According to Table (H for sample number T = 75, 000 the Amari-index stays below 1% on 
average (0.41 — 0.7%) and has a small (0.03 — 0.11) standard deviation. 

In the case of Beatles database, test parameters were similar to those of the ABC database: the number 
and the dimensions of the components were minimal (d = 2, M = 2) and the dependence on the convolution 
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Figure 7: (a): Amari- index of the JFD method on the Beatles database as a function of sample number and 
for different convolution lengths on log-log scale, (b): Number of sweeps of permutation optimization on the 
derived ISA task as a function of convolution length. Dimension of the ISA task: Z?isa- Black: minimum, 
gray: average, light gray: maximum. For further information, see Table [5j 




(a) (b) (c) 

Figure 8: Hinton-diagrams of the JFD methods on the Beatles database for different convolution parameters, 
(a): L = l (Asa = 8), (b): L = 2 (D 1S a = 16), (c): L = 5 (Asa = 40). In the ideal case, 2 pieces of Asa/2- 
dimensional blocks are formed by multiplying the mixing matrix of the derived ISA task and the estimated 
demixing matrix. 



length was tested. Parameter L took values on 1,2,5, 10,20,30. The number of observations varied between 
1, 000 <T< 75, 000. The Amari-index and the sweep number of the optimization are illustrated in Figure [71 
Precise values of the Amari-index are provided in Table G3 The Hinton-diagrams are in Figure El 

The Beatles songs are non-i.i.d sources and subsequent samples s m (t) and s m (t — 1) have dependencies. 
Thus, in S(t) the components of the ((6j) model that belong to the same song can not be distinguished. In the 
ideal case, however, the songs can be separated, that is, the separation of the 2 pieces of (DisA/2)-dimensional 
song-subspaces is possible. We measure the appearance of the 2 of (Z?igA/2)-dimensional blocks for the Beatles 
songs by means of the Amari-index. The results, which demonstrate the success of our method, are shown 
in Figure [7J It can be seen that the JFD method found the hidden components, for 50 and 75 thousand 
samples for L = 30 (convolution length is 31) too. The number of sweeps required for the optimization of the 
permutations was between 1 and 6 for all sample numbers 1,000 < T < 75,000, parameters 1 < L < 30 and 
for all 50 random initializations. According to Table El for sample number T = 75, 000 the Amari-index is 
between 1.07% (L = 1) and 4.43% (L = 30) on the average and has a small (0.04 — 0.09) standard deviation. 
Separation of the 2 of -Disa/2 dimensional subspaces are illustrated in Figure [8] through the Hinton-diagrams. 
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L = 1 


L = 2 


L = 5 


L = 10 


L = 20 


L = 30 


1.07% (±0.04) 


1.43% (±0.09) 


2.29% (±0.07) 


3.31% (±0.06) 


4.03% (±0.06) 


4.43% (±0.04) 



Table 5: Amari-index of the JFD method for Beatles database for different convolution lengths: average ± 
deviation. Number of samples: T = 75,000. For other sample numbers between 1,000 < T < 75,000 see 
Figure [7{a). 





M = 2 


M = 5 


M = 10 


M = 15 


KCCA 


1.33% (±0.48) 


1.20% (±0.17) 


2.76% (±2.86) 


3.00% (±2.21) 


KGV 


1.26% (±0.54) 


1.18% (±0.17) 


1.51% (±0.31) 


1.54% (±0.34) 



Table 6: Amari-index for the KCCA and the KGV methods for database ABC, for different component number 
M: average ± deviation. Number of samples: T = 5, 000. For other sample numbers between 100 < T < 5, 000, 
see Figure [U 

5.3.2 Kernel-ISA Techniques 

We study the efficiency of the KCCA and KGV kernel-ISA methods of Section ^. 1.2i The kernel-ISA techniques 
was found to have a higher computational burden, but they also have advantages compared with the JFD 
technique for ISA tasks. 

For the KCCA and KGV methods we also applied the pseudocode of Table [2j ICA was executed by the 
fastICA algorithm [56]. The Gaussian kernel A:(u, v) oc exp ^ was chosen for both the KCCA and KGV 

methods and parameter a was set to 5. In the KCCA method, regularization parameter k = 10 -4 was applied. 
In the experiments, parameters (a, k) were proved to be reasonably robust. Mixing matrix A was generated 
randomly from the orthogonal group and the sample number was chosen from the interval 100 < T < 5,000. 

Our first ISA example concerns the ABC database. The dimension of a component was d = 2 and the 
number of the components M took different values (M = 2,5, 10, 15). Precision of the estimations is shown 
in Figure where the precision of the JFD method on the same database is also depicted. The number of 
sweeps required for the optimization of the permutations is shown in Figure [TH] for different sample numbers 
and for different component numbers. The data are averaged over 50 random estimations. Figure [11] depicts 
the KCCA estimation for the ABC database. 

Figure O shows that the KCCA and KGV kernel-ISA methods give rise to high precision estimations on the 
ABC database even for small sample numbers. The KGV method was more precise for all M values studied 
than the JFD method. The ratio of precisions could be as high as 4 (see sample number 500). The KCCA 
method is somewhat weaker. For smaller tasks (M = 2 and 5) and for small sample numbers it also exceeds 
the precision of the JFD method. Precision of the method become about the same for higher sample numbers 
and larger tasks. Sweep numbers of the KCCA (KGV) method were between 2 and 8 (2 and 6) (Figure [TO]) . 
Note that one sweep is always necessary for our procedure (Table [2]). A single sweep may be satisfactory if — by 
chance — the ICA provides the correct permutation. 

The other illustration concerns the all-k-independent database. This test can be difficult for ISA methods 
[35] . Number of components M was 2. For k = 2,3, when the dimension of the components d = 3 and 
4, respectively, the KCCA and KGV kernel-ISA methods efficiently estimated the hidden components. The 
precision of the KCCA and KGV estimations as well as the comparison with the JFD method are shown in 
Figure fl~2l The average number of sweeps required for the optimization of the permutations for different k 
values and for 50 randomly initialized computations is provided in Figure [T3l The values of the Amari-indices 
are shown in Table [7] 

According to Figure [12] the two kernel-based methods exhibit similar precision. Both were superior to the 
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Figure 9: (a) and (b): Amari-index of the KCCA and KGV methods, respectively, for the ABC database as a 
function of sample number and for different numbers of components M. (c) and (d): Amari-index of the JFD 
method is divided by the Amari-index of the KCCA method and the KGV technique, respectively. For values 
larger (smaller) than 1 the kernel-ISA method is better (worse) than the JFD method. 
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(a) (b) 

Figure 10: Number of sweeps for the KCCA and the KGV methods needed for the optimization of permutations 
as a function of component number M on the ABC database, (a): KCCA method, (b): KGV method. Black: 
minimum, gray: mean, light gray: maximum. 
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(a) 

Figure 11: Illustration of the KCCA method for the ABC database. Sample number: T = 5, 000. (a): observed 
mixed signals x(t). (b) Hinton-diagram: the product of the mixing matrix of the derived ISA task and the 
estimated demixing matrix (= approximately block-permutation matrix), (c): estimated components. Hidden 
components are recovered up to permutation and orthogonal transformation. 





k = 2 


k = 3 


KCCA/KGV 


0.0017% (±0.0014) 


0.16% (±0.04) 



Table 7: The Amari-index of the KCCA and KGV methods for database all-k-independent for different k values: 
average ± deviation. Number of samples: T = 5, 000. For other sample numbers between 100 < T < 5, 000, 
see Figure H2J 

JFD technique. The ratio of the Amari-indices for sample number 5,000 and for k = 2 is more than 15,000, 
for k = 3 it is more than 500. For details concerning the Amari-indices, see Table [71 These indices are close 
to each other for the KCCA and the KGV methods: 0.0017% for k = 2, 0.16% for k = 3 on average. Both 
kernel-ISA methods used 2 — 3 sweeps for the optimization of the permutations (Figure fl~3j) . 



6 Conclusions 

We have introduced a new model, the blind subspace deconvolution (BSSD) for data analysis. This model 
deals with the casual convolutive mixture of multidimensional independent sources. The undercomplete version 
(uBSSD) of the task has been presented, and it has been shown how to derive an independent subspace analysis 
(ISA) task from the uBSSD problem. Recent developments of the ISA techniques enabled us to handle the 
emerging high dimensional problems. Our earlier results, namely the ISA Separation Theorem [33] motivated 
us to reduce the ISA task to the search for the optimal permutation of the ICA components. The components 
were grouped with a novel joint decorrelation technique, the joint f-decorrelation (JFD) method |17j . 

Also, we adapted other ICA techniques, such as the KCCA and KGV methods to the ISA task and studied 
their efficiency Simulations indicated that although the KCCA and KGV methods give rise to serious compu- 
tational burden relative to the JFD method, they can be advantageous for smaller ISA tasks and for ISA tasks 
when the number of samples is small. 

Finally, we note that we achieved small errors in these high dimensional computations. These small errors 
indicate that the Separation Theorem is robust and might be extended to a larger class of noise sources. 
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Figure 12: (a) and (b): Amari-indices of the KCCA and KGV methods, respectively, as a function of the 
sample number and for k = 2 and 3 on the all-k-independent database. For more details, see Table [71 (c): ratio 
of Amari-indices of JFD and KCCA methods, (d): ratio of Amari-indices of JFD and KGV methods. 
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Figure 13: Number of sweeps of permutation optimization for the KCCA (a) and KGV (b) methods for the 
all-k-independent database and for different k values. Black: minimum, gray: mean, light gray: maximum. 
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A Proof of the ISA Separation Theorem 

We shall rely on entropy inequalities (Section IA.1|) . In Section IA.2I connection to the ICA cost function is 
derived (Lemma [2]) . The ISA Separation Theorem then follows. 

A.l EPI-type Relations 

First, consider the so-called entropy power inequality (EPI) 

L 

i=l 

where u\, . . . ,ul £R denote continuous random variables. This inequality holds, for example, for independent 
continuous variables |39|. 

If EPI is satisfied on the surface of the L-dimensional unit sphere § L , then a further inequality holds: 

Lemma 1 Suppose that continuous random variables u\, . . . ,ul £K satisfy the following inequality 

L 

e 2ff(Ef=i«w) > J2e 2H(WtUt) y™ G § L - (16) 

8=1 

This inequality will be called the w-EPI condition. Then Equation (J8j) holds, too. 

Proof Assume that w£§ 1 . Applying In on condition (|16p . and using the monotonicity of the In function, we 
can see that the first inequality is valid in the following inequality chain 

>X;^ln(e 2 ^))=2X;^(n J ). 

i=l " i=l 

Here, 

1. we used the relation 

H(wiUi) = H(ui) + ln (\wi\) 
for the entropy of the transformed variable ]39^ . Hence 

e 2H(w lUl ) = e 2Jf(« j )+21n(|w i |) = & 2H ( Ui ) e 2\n{\wi\) = e 2H{u,) w 2_ 

2. In the second inequality, the concavity o/ln was exploited. □ 

Note 5 w-EPI holds, for example, for independent variables U{, because independence is not affected by mul- 
tiplication with a constant. 
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A. 2 Connection to the Cost Function of the ICA Task 

The ISA Separation Theorem will be a corollary of the following claim: 



Lemma 2 Let y = [y 1 ; . . . ;y M ] = y(W) = Ws G R D , where W G O d (D = Y,rn=i d m), Y m G R dm is the 
estimation of the m th component of the ISA task. Let y™ G R be the i th coordinate of the m th component 
(2 = 1,... , d m J- Similarly, let sf 1 G R stand /or tae coordinate of the m th source. Let us assume that the 
sources satisfy the condition (jHJ). Then 



u := s m G R d 



M d v 



M d„ 



(17) 



m=l i=l 



m=l i=l 



Proof Lei us denote the (i,j) th element of matrix W by W% j. Coordinates of y and s toZ/ be denoted by 
yi and s,, respectively. Further, let 9 m denote the indices of the m th subspace (m = 1, ...,M), that is, 
9 m := {1 + Yli=i di, . . . , YliLi d-i} (do '■= 0). Now, writing the elements of the i th row of matrix multiplication 
y = Ws, we have 

Vi=J2 W ^ + • • • + E W <**i ( 18 ) 
jes 1 jeS M 



and thus, 



M 



H(y t )=H\ E E W ^ s 

>m=lj£S m 



H 



M 

E 

m=l 



Y. w t 

v/GS m 



M 
m=l 
M 

= E 

m=l 



M 

^E 

m=l 



E w i\ H 

JeS m 



E 



E K E 




(19) 
(20) 

(21) 

(22) 

(23) 
(24) 



E ir w /7 ^ • ••• • E WfjHte). 

jes 1 jeS M 

The above steps can be justified as follows: 

1. (jl9p : Equation (|18p was inserted into the argument of H. 

2. (|20p : iVey; terms were added for Lemma [IJ 

5. (|2ip : Sources s m are independent of one another and this independence is preserved upon mixing within 
the subspaces, and we could also use LemmaUl because W is an orthogonal matrix. 
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4- (|22j) : Nominators were transferred into the J^ - terms. 

5. (|23f) / Variables s m satisfy condition ([8|) according to our assumptions. 

6. (|24p .- We simplified the expression after squaring. 



Using this inequality, summing it fori, exchanging the order of the sums, and making use of the orthogonality 
of matrix W, we have 



D D 



E * E E + • • • + E ^ to) 

i=i i=i yjes 1 je3 M 

e(xx,W)+---+e (xx;V( 

j 6 ql \i=l / ,- e gM \i=l / 



jeS 1 Vi=l / jes^ 1 ' 

D 

Corollary (ISA Separation Theorem) ICA minimizes the l.h.s. of Equation (I17|) . i/tai is, it minimizes 
Ylm=l ^2i=l H (ff 1 )- ^ e se ^ °/ ^rama is invariant to permutations and to changes of the signs. Also, 
according to Proposition^ {s™}, that is, the coordinates of the s m components of the ISA task belong to the 
set of the minima. □ 



B Kernel Covariance Technique for the ISA Task 

For the sake of completeness, the extension of the KC method [35] for the ISA task is detailed below. The 
extension is similar to the extensions presented in Section 14.1.21 (see the KCCA method), and we use the 
notations of that section. 

First, we would like to measure the dependence of two 2 random variables u £ M dl and v € M^ 2 . The KC 
technique defines their dependence as their maximal covariance on the unit spheres S u , S v of function spaces 
3™, 3^: 



J KC (u,v,J u ,3- v ) := sup \E{[g(u) - Eg(u)][h(v) - Eh(v)]}\. 
This function Jkc can be estimated empirically from T-element samples ui, . . . , € M. dl , vi, 

T 



J^(u,v,J u ,J v ):= sup 



u t )-g][h(v t )-h] 



t=l 



The estimation can be reduced to the following conditional maximization problem: 



sup_ Cl K u K v c 2 . 

c*K u ci<l,c^K v c 2 <l 



(25) 



After the adaptation of the Lagrange multiplier technique and the computation of the stationary points of 
(j25[) it can be realized that the values of [ci;c2] and J^q P can be computed as the solutions of the generalized 
eigenvalue problem 



K^K V 
K V K U K v 



A 



K u 
K v 



(26) 
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r M 



If the task is to measure the dependence between more than two random variables y 1 € 



then (|26p is to be replaced with the following generalized eigenvalue problem: 



K 2 K! 



\k m k 1 k m k 



K]K 2 
K 2 



K'K M \ f Cl \ 
K 2 K M c 2 



K M ) 



\cmJ 



/K 1 
K 2 

V 



\ fc x \ 

C2 





K M / 



\cm/ 



Using the maximal eigenvalue of this problem, Jkc can be estimated. 
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